PIBNet: a Physics-Inspired Boundary Network for Multiple Scattering Simulations
Marsal, Rémi, Chaillat, Stéphanie
The boundary element method (BEM) provides an efficient numerical framework for solving multiple scattering problems in unbounded homogeneous domains, since it restricts the discretization to the domain boundaries, thereby reducing the computational complexity. The procedure first consists of determining the solution trace on the boundaries of the domain by solving a boundary integral equation, after which the volumetric solution can be recovered at low computational cost with a boundary integral representation. As this first step represents the main computational bottleneck, we introduce PIBNet, a learning-based approach designed to approximate the solution trace. The method leverages a physics-inspired graph-based strategy to model obstacles and their long-range interactions efficiently. We further introduce a novel multiscale graph neural network architecture for simulating multiple scattering. To train and evaluate our network, we present a benchmark consisting of several datasets covering different types of multiple scattering problems. The results indicate that our approach not only surpasses existing state-of-the-art learning-based methods on the considered tasks but also generalizes better to settings with an increased number of obstacles. github.com/ENSTA-U2IS-AI/pibnet
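The second BEM step the abstract relies on is cheap to make concrete: once the solution trace on the boundary is available (here, predicted by a network), the exterior field follows from a boundary integral representation. Below is a minimal NumPy sketch for the 3D Helmholtz equation; the crude midpoint quadrature, variable names, and sign conventions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def representation_formula(x, nodes, normals, weights, trace, normal_trace, k):
    """Sketch of the representation formula
    u(x) = int_Gamma [ dG/dn(y) u(y) - G(x,y) du/dn(y) ] ds(y)
    evaluated with a midpoint rule; real BEM codes need singular
    quadrature, and signs depend on the normal orientation."""
    r_vec = x - nodes                               # (N, 3) vectors x - y
    r = np.linalg.norm(r_vec, axis=-1)              # (N,) distances
    G = np.exp(1j * k * r) / (4 * np.pi * r)        # free-space Green's function
    # dG/dn(y) with outward normals n(y); note dr/dy = -(x - y)/r
    dG_dn = -np.einsum('ij,ij->i', r_vec, normals) * (1j * k * r - 1.0) * G / r**2
    return np.sum(weights * (dG_dn * trace - G * normal_trace))
```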
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Irion County (0.04)
- (3 more...)
Adding simple structure at inference improves Vision-Language Compositionality
Miranda, Imanol, Salaberria, Ander, Agirre, Eneko, Azkune, Gorka
Dual encoder Vision-Language Models (VLMs) such as CLIP are widely used for image-text retrieval tasks. However, those models struggle with compositionality, showing a bag-of-words-like behavior that limits their retrieval performance. Many different training approaches have been proposed to improve the vision-language compositionality of those models. In comparison, inference-time techniques have received little attention. In this paper, we propose to add simple structure at inference, where, given an image and a caption: i) we divide the image into smaller crops, ii) we extract text segments capturing objects, attributes, and relations, iii) using a VLM, we find the image crops that best align with each text segment, obtaining matches, and iv) we compute the final image-text similarity by aggregating the individual similarities of the matches. Based on several popular dual encoder VLMs, we evaluate our approach on controlled and natural datasets for VL compositionality. We find that our approach consistently improves the performance of the evaluated VLMs without any training, which shows the potential of inference-time techniques. The results are especially good for attribute-object binding, as shown on the controlled dataset. Through an extensive analysis: i) we show that processing image crops is essential for the observed performance gains, and ii) we identify specific areas in which to further improve inference-time approaches.
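The four inference-time steps compose naturally into one scoring function. The sketch below assumes hypothetical stand-ins for the cropper, the text segmenter, and a CLIP-style dual encoder returning L2-normalized vectors; it is not the paper's exact pipeline.

```python
import numpy as np

def structured_similarity(image, caption, crop_fn, segment_fn,
                          encode_image, encode_text):
    """Hedged sketch of the four-step procedure; crop_fn, segment_fn,
    and the encoders are hypothetical stand-ins."""
    crops = [image] + list(crop_fn(image))              # i) image -> smaller crops
    segments = [caption] + list(segment_fn(caption))    # ii) objects/attributes/relations
    C = np.stack([encode_image(c) for c in crops])      # (n_crops, d)
    S = np.stack([encode_text(s) for s in segments])    # (n_segments, d)
    sims = S @ C.T                                      # cosine similarities
    best = sims.max(axis=1)                             # iii) best-matching crop per segment
    return float(best.mean())                           # iv) aggregate into one score
```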
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Texas > Irion County (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Statistical Management of the False Discovery Rate in Medical Instance Segmentation Based on Conformal Risk Control
Dai, Mengxia, Luo, Wenqian, Li, Tianyang
Instance segmentation plays a pivotal role in medical image analysis by enabling precise localization and delineation of lesions, tumors, and anatomical structures. Although deep learning models such as Mask R-CNN and BlendMask have achieved remarkable progress, their application in high-risk medical scenarios remains constrained by confidence calibration issues, which may lead to misdiagnosis. To address this challenge, we propose a robust quality control framework based on conformal prediction theory. This framework innovatively constructs a risk-aware dynamic threshold mechanism that adaptively adjusts segmentation decision boundaries according to clinical requirements. Specifically, we design a \textbf{calibration-aware loss function} that dynamically tunes the segmentation threshold based on a user-defined risk level $\alpha$. Utilizing exchangeable calibration data, this method ensures that the expected false negative rate (FNR) or false discovery rate (FDR) on test data remains below $\alpha$ with high probability. The framework maintains compatibility with mainstream segmentation models (e.g., Mask R-CNN, BlendMask+ResNet-50-FPN) and datasets (PASCAL VOC format) without requiring architectural modifications. Empirical results demonstrate that our calibration framework rigorously bounds the FDR metric marginally over the test set.
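The core calibration step can be sketched generically. The following conformal-risk-control routine assumes a per-sample risk that is non-increasing in the threshold, which is a simplification (controlling a non-monotone risk such as FDR exactly typically requires the Learn-then-Test variant); the function names and grid are illustrative, not the paper's API.

```python
import numpy as np

def calibrate_threshold(risk_fn, calib_set, alpha, lambdas, B=1.0):
    """Pick the smallest threshold whose corrected empirical risk on
    exchangeable calibration data stays below the user-defined level alpha.
    risk_fn(sample, lam) -> per-sample risk in [0, B], assumed
    non-increasing in lam."""
    n = len(calib_set)
    for lam in sorted(lambdas):
        r_hat = np.mean([risk_fn(x, lam) for x in calib_set])
        # finite-sample correction guaranteeing E[risk on a test point] <= alpha
        if (n / (n + 1)) * r_hat + B / (n + 1) <= alpha:
            return lam
    return max(lambdas)   # most conservative fallback
```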
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Texas > Irion County (0.04)
DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction
Pan, Sicong, Jin, Liren, Huang, Xuying, Stachniss, Cyrill, Popović, Marija, Bennewitz, Maren
Many autonomous robotic applications depend on accurate 3D models of objects to perform downstream tasks. These include object manipulation in household scenarios (Breyer et al. 2022; Dengler et al. 2023; Jauhri et al. 2024), harvesting and prediction of intervention actions in agriculture (Pan et al. 2023; Lenz et al. 2024; Yao et al. 2024), as well as solving jigsaw puzzles of fragmented frescoes in archaeology (Tsesmelis et al. 2024). For these applications, high-fidelity 3D object representations are critical to enable precise action execution and informed decision-making. When deployed in initially unknown environments, robots are often required to autonomously reconstruct 3D models of objects to understand their geometries, textures, positions, and orientations before taking action. Generating these models typically involves capturing data from multiple viewpoints using onboard sensors such as RGB or depth cameras. Data acquisition solely following predefined or randomly chosen sensor viewpoints is inefficient, as these approaches fail to adapt to the geometry and spatial distribution of the object to be reconstructed. This can lead to inferior reconstruction results, especially when objects are complex and contain self-occlusions. To address this, we propose using active reconstruction strategies, where object-specific sensor viewpoints are planned for data acquisition to achieve high-quality 3D object reconstruction. The key aspect of active reconstruction is view planning for generating viewpoints (Zeng et al. 2020a) that enables the robot to acquire the most informative sensor measurements.
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
- North America > United States > Texas > Irion County (0.04)
- Europe > Portugal (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Difficulty Modelling in Mobile Puzzle Games: An Empirical Study on Different Methods to Combine Player Analytics and Simulated Data
Kristensen, Jeppe Theiss, Burelli, Paolo
Difficulty is one of the key drivers of player engagement, and it is often one of the aspects that designers tweak most to optimise the player experience; operationalising it is, therefore, a crucial task for game development studios. A common practice consists of creating metrics out of data collected from player interactions with the content; however, this allows for estimation only after the content is released and does not consider the characteristics of potential future players. In this article, we present a number of potential solutions for estimating difficulty under such conditions, and we showcase the results of a comparative study intended to understand which method and which types of data perform better in different scenarios. The results reveal that models trained on a combination of cohort statistics and simulated data produce the most accurate estimations of difficulty in all scenarios. Furthermore, among these models, artificial neural networks show the most consistent results.
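The best-performing setup reported above (a neural network on cohort statistics plus simulated data) can be sketched in a few lines. Feature layout, hyperparameters, and target choice below are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_difficulty_model(cohort_stats, sim_stats, difficulty):
    """Sketch: per-level cohort statistics (e.g. past players' completion
    rate) concatenated with simulated-agent statistics, regressed onto a
    difficulty target such as expected attempts per level."""
    X = np.hstack([cohort_stats, sim_stats])    # combine both data sources
    model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000,
                         random_state=0)
    model.fit(X, difficulty)
    return model
```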
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Oceania > Australia > Western Australia > Perth (0.04)
- North America > United States > Texas > Irion County (0.04)
- (4 more...)
Unbiased Filtering Of Accidental Clicks in Verizon Media Native Advertising
Kaplan, Yohay, Krasne, Naama, Shtoff, Alex, Somekh, Oren
Verizon Media (VZM) native advertising is one of VZM's largest and fastest growing businesses, reaching a run-rate of several hundred million USD in the past year. Driving the VZM native models that are used to predict event probabilities, such as click and conversion probabilities, is OFFSET - a feature-enhanced collaborative-filtering-based event-prediction algorithm. In this work we focus on the challenge of predicting click-through rates (CTR) when we are aware that some of the clicks have short dwell-time and are defined as accidental clicks. An accidental click implies little affinity between the user and the ad, so predicting that similar users will click on the ad is inaccurate. Therefore, it may be beneficial to remove clicks with dwell-time lower than a predefined threshold from the training set. However, we cannot simply ignore these positive events, as filtering them out would cause the model to under-predict. Previous approaches have tried to apply filtering and then add corrective biases to the CTR predictions, but did not yield revenue lifts and therefore were not adopted. In this work, we present a new approach where the positive weight of the accidental clicks is distributed among all of the negative events (skips), based on their likelihood of causing accidental clicks, as predicted by an auxiliary model. These likelihoods are taken as the correct labels of the negative events, shifting our training away from using only binary labels; accordingly, we adopt a binary cross-entropy loss function. After showing offline performance improvements, the modified model was tested online serving VZM native users, and provided a 1.18% revenue lift over the production model, which is agnostic to accidental clicks.
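The label-redistribution idea can be written down compactly. In the sketch below, the dwell-time cutoff tau and the proportional weighting scheme are illustrative assumptions; the resulting soft labels would then feed a binary cross-entropy loss.

```python
import numpy as np

def redistribute_accidental_clicks(labels, dwell_time, accident_scores, tau=2.0):
    """Hedged sketch: positives with short dwell time are removed from the
    positives, and their total weight is spread over the skips in proportion
    to an auxiliary model's accidental-click likelihoods (accident_scores)."""
    y = labels.astype(float).copy()
    accidental = (labels == 1) & (dwell_time < tau)
    y[accidental] = 0.0                          # filter accidental clicks
    skips = y == 0                               # negatives, incl. filtered clicks
    w = accident_scores * skips
    if w.sum() > 0:
        y += accidental.sum() * w / w.sum()      # redistribute the positive mass
    return y                                     # soft labels for BCE training
```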
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Oceania > Australia (0.04)
- North America > United States > Texas > Irion County (0.04)
- (3 more...)
- Information Technology > Networks (0.61)
- Information Technology > Services (0.46)
FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction
Wang, Hangyu, Lin, Jianghao, Li, Xiangyang, Chen, Bo, Zhu, Chenxu, Tang, Ruiming, Zhang, Weinan, Yu, Yong
Click-through rate (CTR) prediction serves as a core function module in various personalized online services. Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of the tabular modality, capturing collaborative signals via feature interaction modeling. But one-hot encoding discards the semantic information conveyed in the original feature texts. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as inputs the sentences of the textual modality obtained via hard prompt templates and adopts PLMs to extract semantic knowledge. However, PLMs generally tokenize the input text into subword tokens and ignore field-wise collaborative signals. Therefore, these two lines of research focus on different characteristics of the same input data (i.e., the textual and tabular modalities), forming a complementary relationship with each other. In this paper, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. We design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM for downstream CTR prediction tasks, achieving superior performance by combining the advantages of both models. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs.
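The joint masked-reconstruction objective admits a compact sketch. All model interfaces below (the encoders and the cross-modal decode_* heads) are hypothetical stand-ins, not the paper's exact API; the point is only that each modality's masked entries are predicted from the other modality's encoding.

```python
import torch
import torch.nn.functional as F

def joint_reconstruction_loss(id_model, plm, tab_ids, text_ids,
                              tab_mask, text_mask):
    """Hedged sketch of FLIP-style joint masked pretraining.
    tab_ids: (B, F) feature IDs; text_ids: (B, T) token IDs;
    *_mask: boolean masks of the same shapes (True = masked)."""
    tab_emb = id_model(tab_ids.masked_fill(tab_mask, 0))    # masked tabular stream
    txt_emb = plm(text_ids.masked_fill(text_mask, 0))       # masked textual stream
    # each modality's decoder conditions on the *other* modality's encoding
    tab_logits = id_model.decode_fields(txt_emb)            # (B, F, n_feature_values)
    txt_logits = plm.decode_tokens(tab_emb)                 # (B, T, vocab_size)
    loss_tab = F.cross_entropy(tab_logits[tab_mask], tab_ids[tab_mask])
    loss_txt = F.cross_entropy(txt_logits[text_mask], text_ids[text_mask])
    return loss_tab + loss_txt                              # feature-level alignment signal
```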
- North America > United States > Michigan > Isabella County (0.24)
- Asia > China > Shanghai > Shanghai (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Texas > Irion County (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing
Ying, Wangyang, Wang, Dongjie, Liu, Kunpeng, Sun, Leilei, Fu, Yanjie
Feature generation aims to generate new and meaningful features to create a discriminative representation space. A generated feature is meaningful when it comes from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions and generate meaningful dimensions from an exponentially large search space, in an optimal crossing form over an optimal generation path. But machines have limited human-like abilities. We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework for self-optimizing feature generation. To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach. We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.
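The three-step hashing representation (discretize, hash, summarize) can be illustrated directly. Bin and bucket counts below are illustrative assumptions, and Python's built-in hash() of strings is randomized per process, so a real system would use a stable hash.

```python
import numpy as np
import pandas as pd

def categorical_hash_representation(df: pd.DataFrame, n_bins=10, n_buckets=64):
    """Sketch of the three-step hashing representation over a feature set."""
    # 1) feature discretization: continuous columns -> quantile bin codes
    binned = df.apply(lambda c: pd.qcut(c, n_bins, labels=False,
                                        duplicates='drop'))
    # 2) feature hashing: (column, bin) pairs -> a fixed number of buckets
    rep = np.zeros(n_buckets)
    for name, col in binned.items():
        for b in col.dropna().unique():
            rep[hash((name, int(b))) % n_buckets] += 1
    # 3) descriptive summarization: normalized bucket histogram as the state
    return rep / max(rep.sum(), 1.0)
```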
- North America > United States > Texas > Irion County (0.04)
- North America > United States > Florida > Orange County > Orlando (0.04)
- North America > United States > Arizona (0.04)
- Asia > China > Beijing > Beijing (0.04)
MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction
Wenniger, Gideon Maillette de Buy, van Dongen, Thomas, Schomaker, Lambert
Automatic assessment of the quality of scholarly documents is a difficult task with high potential impact. Multimodality, in particular the addition of visual information next to text, has been shown to improve performance on scholarly document quality prediction (SDQP) tasks. We propose the multimodal predictive model MultiSChuBERT. It combines a textual model based on chunking full paper text and aggregating computed BERT chunk-encodings (SChuBERT) with a visual model based on Inception V3. Our work contributes to the current state-of-the-art in SDQP in three ways. First, we show that the method of combining visual and textual embeddings can substantially influence the results. Second, we demonstrate that gradual unfreezing of the weights of the visual sub-model reduces its tendency to overfit the data, improving results. Third, we show the retained benefit of multimodality when replacing standard BERT$_{\textrm{BASE}}$ embeddings with more recent state-of-the-art text embedding models. Using BERT$_{\textrm{BASE}}$ embeddings, on the (log) number of citations prediction task with the ACL-BiblioMetry dataset, our MultiSChuBERT (text+visual) model obtains an $R^{2}$ score of 0.454 compared to 0.432 for the SChuBERT (text-only) model. Similar improvements are obtained on the PeerRead accept/reject prediction task. In our experiments using SciBERT, scincl, SPECTER and SPECTER2.0 embeddings, we show that each of these tailored embeddings adds further improvements over the standard BERT$_{\textrm{BASE}}$ embeddings, with the SPECTER2.0 embeddings performing best.
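Gradual unfreezing of the visual sub-model, the second contribution above, has a standard shape in PyTorch. The sketch below treats the module's direct children as "layer groups", which is an illustrative assumption; it would be called once at the start of each epoch.

```python
import torch.nn as nn

def gradual_unfreeze(visual_model: nn.Module, epoch: int,
                     groups_per_epoch: int = 1):
    """Hedged sketch: start with the visual sub-model fully frozen and
    release its top-most layer groups one epoch at a time, curbing its
    tendency to overfit."""
    groups = list(visual_model.children())
    n_open = min(len(groups), epoch * groups_per_epoch)
    for i, group in enumerate(groups):
        trainable = i >= len(groups) - n_open   # unfreeze from the top down
        for p in group.parameters():
            p.requires_grad = trainable
```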
- Europe > Netherlands (0.04)
- Oceania > Australia (0.04)
- North America > United States > Texas > Irion County (0.04)
- (4 more...)
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Ryabinin, Max, Dettmers, Tim, Diskin, Michael, Borzunov, Alexander
Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous, and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth.
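The "temporary randomized pipelines" idea reduces to a small routing policy. The sketch below is a hedged illustration: each microbatch picks its next-stage peer at random, biased toward higher measured throughput, and failed peers are dropped so the pipeline rebalances; the data structures and weighting are assumptions, not the paper's implementation.

```python
import random

class SwarmRouter:
    """Hedged sketch of SWARM-style stochastic wiring between pipeline stages."""

    def __init__(self, peers_per_stage):
        # peers_per_stage: list over pipeline stages of {peer_id: throughput}
        self.stages = [dict(p) for p in peers_per_stage]

    def next_peer(self, stage: int):
        # sample the next-stage worker, biased toward faster peers
        ids, weights = zip(*self.stages[stage].items())
        return random.choices(ids, weights=weights, k=1)[0]

    def report_failure(self, stage: int, peer_id):
        # drop the dead peer; subsequent samples route around the failure
        self.stages[stage].pop(peer_id, None)
```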
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (23 more...)